2025-02-28
After installing with pip install polars, we can import Polars and read in data.
Polars DataFrames have methods that mirror SQL. DataFrames are still composed of columns called Series. However, unlike Pandas DataFrames, Polars DataFrames don’t have a row index (so no need to .reset_index()).
Additionally, instead of relying on Pandas’ .loc[] property, Polars includes function-like expressions like pl.col('column_name').
The parameters for Polars’ .slice() method are the start index and the length.
We similarly don’t need to rely on Pandas’ .iloc[] property.
To reiterate, instead of using [ ] like in Pandas, in Polars we rely on expressions like pl.col().
Polars separates row filtering and column selection, which Pandas combines in .loc[], into .filter() and .select().
Polars is a DataFrame library built on a query engine, so it works much like a query language such as SQL. It’s not surprising, then, to see methods with names that more closely mirror queries, like the .with_columns() method.
Joins are straightforward.
While method chaining is possible in Python generally, Polars especially embraces writing consecutive lines of code as a single chain. Note the ( ) surrounding each chain:
crm_data_pd = (customer_data_pd
.set_index('customer_id')
.join(store_transactions_pd.set_index('customer_id'))
)
(crm_data_pd
.loc[(crm_data_pd['region'] == 'West') & (crm_data_pd['feb_2005'] == crm_data_pd['feb_2005'].max())]
.assign(age = 2024 - crm_data_pd['birth_year'])
.loc[:, ['age', 'feb_2005']]
.sort_values('age', ascending = False)
.iloc[0:1]
)
marc.dotson@usu.edu
github.com/marcdotson
occasionaldivergences.com